Distributed Online Learning for Latent Dirichlet Allocation

Authors

  • JinYeong Bak
  • Dongwoo Kim
Abstract

A major obstacle in using Latent Dirichlet Allocation (LDA) is the amount of time it takes for inference, especially for a dataset that starts out large and expands quickly, such as a corpus of blog posts or online news articles. Recent developments in distributed inference algorithms for LDA, as well as minibatch-based online learning algorithms, have offered partial solutions to this problem. In this paper, we propose a distributed online learning algorithm for LDA that addresses both aspects of the problem at once. We apply our learning algorithm to a corpus of Twitter conversations and show that it achieves the same model fit within a much shorter learning time.
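As a rough illustration of the kind of algorithm the abstract describes, the sketch below combines the two ingredients it names: a minibatch is split across workers, each worker runs a local variational E-step on its shard, and the aggregated sufficient statistics are blended into the global topic-word parameters with a decaying learning rate. This is an assumption about the approach, not the authors' code; the function names (local_e_step, distributed_online_update) and all hyperparameter values are illustrative.

import numpy as np
from scipy.special import digamma

K, V = 10, 1000             # number of topics and vocabulary size (assumed values)
alpha, eta = 0.1, 0.01      # Dirichlet hyperparameters (assumed values)
rng = np.random.default_rng(0)
lam = rng.gamma(100.0, 0.01, size=(K, V))    # global variational topic-word parameters

def local_e_step(shard, lam):
    # Variational E-step on one worker's shard; returns its share of the
    # sufficient statistics needed for the global update.
    Elog_beta = digamma(lam) - digamma(lam.sum(1, keepdims=True))
    sstats = np.zeros_like(lam)
    for ids, cts in shard:                   # each doc: (word-id array, count array)
        ids, cts = np.asarray(ids), np.asarray(cts, dtype=float)
        gamma = np.ones(K)
        expEtheta = np.exp(digamma(gamma) - digamma(gamma.sum()))
        expEbeta = np.exp(Elog_beta[:, ids])
        for _ in range(50):                  # fixed-point iterations for this document
            phinorm = expEtheta @ expEbeta + 1e-100
            gamma = alpha + expEtheta * ((cts / phinorm) @ expEbeta.T)
            expEtheta = np.exp(digamma(gamma) - digamma(gamma.sum()))
        phinorm = expEtheta @ expEbeta + 1e-100
        sstats[:, ids] += np.outer(expEtheta, cts / phinorm) * expEbeta
    return sstats

def distributed_online_update(shards, lam, t, D, tau0=1.0, kappa=0.7):
    # Map: each worker computes statistics on its shard of the minibatch.
    # Reduce: sum them and take one stochastic natural-gradient step on lambda.
    sstats = sum(local_e_step(shard, lam) for shard in shards)
    batch_size = sum(len(shard) for shard in shards)
    rho = (tau0 + t) ** (-kappa)             # decaying learning rate
    lam_hat = eta + (D / batch_size) * sstats  # minibatch-based estimate of lambda
    return (1 - rho) * lam + rho * lam_hat

In a real deployment, local_e_step would run on separate processes or machines, and only the K-by-V statistic matrices would be shipped back to the node that holds lambda.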

Related papers

Distributed Inference for Latent Dirichlet Allocation

We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic update...
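The periodic-update scheme can be pictured with a small single-process sketch in the spirit of approximate distributed LDA: each worker copies the global topic-word counts, runs a collapsed Gibbs sweep over its own documents, and the resulting count deltas are merged back after every sweep. The partitioning, the per-sweep synchronization, and all names here are illustrative assumptions rather than the paper's implementation.

import numpy as np

K, V, alpha, eta = 10, 1000, 0.1, 0.01   # topics, vocabulary size, hyperparameters (assumed)
rng = np.random.default_rng(1)

def init_shard(docs):
    # Random topic assignments and count tables for one worker's documents;
    # each document is a sequence of word ids.
    z = [rng.integers(K, size=len(words)) for words in docs]
    n_dk = np.zeros((len(docs), K), dtype=int)
    n_kw = np.zeros((K, V), dtype=int)
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            n_dk[d, z[d][i]] += 1
            n_kw[z[d][i], w] += 1
    return z, n_dk, n_kw

def gibbs_sweep(docs, z, n_dk, n_kw, n_k):
    # One collapsed Gibbs sweep over a shard, updating the local counts in place.
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

def approximate_distributed_lda(shards, n_iters=50):
    # Workers sweep independently against a copy of the global topic-word
    # counts; their count deltas are summed into the global table each round.
    states = [init_shard(docs) for docs in shards]
    global_kw = sum(s[2] for s in states)
    for _ in range(n_iters):
        deltas = []
        for docs, (z, n_dk, _) in zip(shards, states):
            local_kw = global_kw.copy()
            gibbs_sweep(docs, z, n_dk, local_kw, local_kw.sum(axis=1))
            deltas.append(local_kw - global_kw)
        global_kw = global_kw + sum(deltas)
    return global_kw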

Particle Filter Rejuvenation and Latent Dirichlet Allocation

Previous research has established several methods of online learning for latent Dirichlet allocation (LDA). However, streaming learning for LDA—allowing only one pass over the data and constant storage complexity—is not as well explored. We use reservoir sampling to reduce the storage complexity of a previously-studied online algorithm, namely the particle filter, to constant. We then show tha...
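The constant-storage trick mentioned above comes down to reservoir sampling. The sketch below shows the standard algorithm (often called Algorithm R), which keeps a uniform fixed-size sample of an unbounded token stream from which a particle filter could draw tokens to rejuvenate. The class name, the capacity, and the (doc id, word id) token format are assumptions for illustration.

import random

class Reservoir:
    # Uniform sample of fixed size over an unbounded stream (constant storage).
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)            # fill phase
        else:
            j = self.rng.randrange(self.seen)  # keep item with probability capacity/seen
            if j < self.capacity:
                self.items[j] = item

# usage: stream tokens past the reservoir, then rejuvenate by resampling from it
reservoir = Reservoir(capacity=10_000)
for token in [(0, 17), (0, 42), (1, 5)]:       # stand-in for a (doc id, word id) stream
    reservoir.add(token)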

Online Learning for Latent Dirichlet Allocation

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, ...
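The convergence claim rests on the step sizes of the stochastic natural-gradient updates satisfying the usual Robbins-Monro conditions (the steps sum to infinity while their squares sum to a finite value). A common schedule with that property is sketched below with illustrative default values; it is the same kind of decaying rate used in the distributed sketch after the first abstract.

def learning_rate(t, tau0=1.0, kappa=0.7):
    # rho_t = (tau0 + t) ** (-kappa): sum(rho_t) diverges and sum(rho_t ** 2)
    # converges whenever 0.5 < kappa <= 1, which is what the convergence
    # argument for online variational Bayes needs.
    assert 0.5 < kappa <= 1.0, "outside this range convergence is not guaranteed"
    return (tau0 + t) ** (-kappa)

# each global step then blends the previous lambda with the minibatch estimate:
# lambda_new = (1 - rho_t) * lambda_old + rho_t * lambda_hat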

Scalable Inference for Latent Dirichlet Allocation

We investigate the problem of learning a topic model – the well-known Latent Dirichlet Allocation – in a distributed manner, using a cluster of C processors and dividing the corpus to be learned equally among them. We propose a simple approximated method that can be tuned, trading speed for accuracy according to the task at hand. Our approach is asynchronous, and therefore suitable for clusters...
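To illustrate only the asynchronous aspect, the toy sketch below lets workers push topic-word count deltas to a queue as soon as a local sweep finishes, while an aggregator folds them into the global counts in arrival order, with no barrier between rounds. The threading setup, the dummy sweep function, and all names are assumptions; a cluster implementation would exchange these deltas over the network instead.

import queue, threading
import numpy as np

K, V = 10, 1000
global_kw = np.zeros((K, V))   # global topic-word counts
updates = queue.Queue()

def worker(shard_id, n_sweeps, sweep_fn):
    # Each worker sweeps against a possibly stale copy of the global counts
    # and pushes its delta without waiting for the other workers.
    for _ in range(n_sweeps):
        delta = sweep_fn(shard_id, global_kw.copy())
        updates.put(delta)

def aggregator(expected_updates):
    # Fold deltas into the global counts in whatever order they arrive.
    global global_kw
    for _ in range(expected_updates):
        global_kw = global_kw + updates.get()

dummy_sweep = lambda sid, kw: np.ones((K, V)) * 1e-3   # stand-in for a local Gibbs sweep
workers = [threading.Thread(target=worker, args=(i, 5, dummy_sweep)) for i in range(4)]
agg = threading.Thread(target=aggregator, args=(4 * 5,))
for t in workers: t.start()
agg.start()
for t in workers: t.join()
agg.join()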

Course Content Analysis: An Initiative Step toward Learning Object Recommendation Systems for MOOC Learners

With the accelerating development of open education, low-cost online learning resources, such as Massive Open Online Courses (MOOCs), are reaching a wide audience around the world. However, when faced with these appealing but overwhelming learning resources, learners are prone to making rash learning decisions, which may be either excessive or insufficient for their learning capacities. To avoid the...

Journal:

Volume:   Issue:

Pages:

Publication year: 2012